From Statistical Likelihood to Convex Programs
MATH008 Lesson 7
00:00

Statistical inference asks: "Given this data, what are the most likely underlying parameters?" This slide bridges that question with convex optimization. We transform the probabilistic notion of likelihood into a structured program, showing that under log-concavity, finding the best estimate is equivalent to solving a convex optimization problem.

The Likelihood Framework

The likelihood function is the probability distribution $p_x(y)$ considered as a function of the parameter $x$ for a fixed observed sample $y$. To estimate $x$, we employ Maximum Likelihood (ML) estimation: choosing the value that makes the observed data most probable.

$$\hat{x}_{ml} = \text{argmax}_x p_x(y) = \text{argmax}_x l(x)$$

For computational convenience, we work with the log-likelihood function, $l(x) = \log p_x(y)$. Because the logarithm is monotonically increasing, it preserves the location of the maximum while turning the products that arise from independent observations into easy-to-manage sums.
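As a minimal sketch of this idea (assuming i.i.d. Gaussian samples with known unit variance, names chosen for illustration), the log-likelihood is a sum of log-densities, and its maximizer over a grid of candidates recovers the sample mean:

```python
import math

def gaussian_log_likelihood(x, data, sigma=1.0):
    """Log-likelihood l(x) = sum of log densities: the product of
    independent densities becomes a sum of their logarithms."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma**2)
        - (y - x) ** 2 / (2 * sigma**2)
        for y in data
    )

data = [1.9, 2.1, 2.3, 1.7, 2.0]

# For a Gaussian with known sigma, the maximizer of l(x) is the sample mean.
candidates = [i / 100 for i in range(100, 301)]
x_ml = max(candidates, key=lambda x: gaussian_log_likelihood(x, data))
# x_ml is the sample mean, 2.0
```

A grid search is used here only so the example stays self-contained; in practice one would maximize $l$ analytically or with a numerical solver.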

The MLE Optimization Program (7.1)

We formalize the estimation as a mathematical program:

$$\begin{array}{ll} \text{maximize} & l(x) = \log p_x(y) \\ \text{subject to} & x \in C \end{array}$$ (7.1)

This program is a convex optimization problem if:

  • The log-likelihood function $l$ is concave for each value of $y$.
  • The feasible set $C$ (prior information) is described by linear equality and convex inequality constraints.
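A standard instance where both conditions hold (an illustrative setup, not taken from the slide): linear measurements $y_i = a_i x + v_i$ with Gaussian noise $v_i \sim \mathcal{N}(0, \sigma^2)$ and $C = \mathbb{R}$. The log-likelihood is a concave quadratic in $x$, so program (7.1) is convex and its maximizer has the familiar least-squares closed form:

```python
# Assumed measurement model: y_i = a_i * x + v_i, v_i ~ N(0, sigma^2).
# l(x) = const - sum_i (y_i - a_i * x)^2 / (2 sigma^2), a concave quadratic,
# so maximizing it is the least-squares problem with solution below.
a = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

# Closed-form maximizer: x_ml = (sum a_i y_i) / (sum a_i^2)
x_ml = sum(ai * yi for ai, yi in zip(a, y)) / sum(ai * ai for ai in a)
```

This is the simplest illustration of the slide's claim: a log-concave noise model turns ML estimation into a globally solvable convex program.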

Integrating Constraints and Priors

To impose physical or prior constraints explicitly, we redefine $p_x(y)$ to be zero for $x \notin C$. In the optimization, this means the log-likelihood function takes the value $-\infty$ for parameters $x$ that violate these constraints, effectively creating an impassable barrier for the optimizer.
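The barrier idea can be sketched directly (an illustrative example, not the slide's: Gaussian data with a hypothetical feasible interval $C = [2.5, 5]$ that excludes the unconstrained maximizer, so the constrained estimate lands on the boundary of $C$):

```python
import math

def constrained_log_likelihood(x, data, lo, hi):
    """Assign -inf outside the feasible set C = [lo, hi], so the
    optimizer can never select an infeasible parameter."""
    if not (lo <= x <= hi):
        return -math.inf
    # Gaussian log-likelihood with sigma = 1, additive constants dropped.
    return sum(-(y - x) ** 2 / 2 for y in data)

data = [1.8, 2.0, 2.2]  # unconstrained maximizer: sample mean 2.0
candidates = [i / 100 for i in range(0, 501)]
x_hat = max(candidates, key=lambda x: constrained_log_likelihood(x, data, 2.5, 5.0))
```

Since $l$ is concave and decreasing past the sample mean, the constrained maximizer sits at the lower boundary of $C$, exactly as the barrier picture predicts.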

🎯 Core Principle
The transition from "Maximum Likelihood" to "Convex Program" relies on the concavity of the log-density. If the noise or distribution is log-concave, statistical estimation becomes a globally solvable optimization task.